Abstract



In our recent work [49] we considered solving under-determined systems of linear equations with sparse
solutions. In a large dimensional and statistical context we proved that if the number of equations in the
system is proportional to the length of the unknown vector then there is a sparsity (number of non-zero
elements of the unknown vector) also proportional to the length of the unknown vector such that a
polynomial $\ell_1$-optimization technique succeeds in solving the system. We provided lower bounds on the
proportionality constants that are in a solid numerical agreement with what one can observe through
numerical experiments. Here we create a mechanism that can be used to derive the upper bounds on the
proportionality constants. Moreover, the upper bounds obtained through such a mechanism match the lower
bounds from [49] and ultimately make the latter ones optimal.

Index Terms: Linear systems of equations; $\ell_1$-optimization; compressed sensing.



1 Introduction



We start by defining what the problem of interest will be (the same of course was the problem of interest
in [49]). We will be interested in finding sparse solutions of under-determined systems of linear equations.
In a more precise mathematical language we would like to find a $k$-sparse $\mathbf{x}$ such that

$$A\mathbf{x} = \mathbf{y} \qquad (1)$$

where $A$ is an $m \times n$ ($m < n$) matrix and $\mathbf{y}$ is an $m \times 1$ vector (see Figure 1; here and in the rest of the
paper, under $k$-sparse vector we assume a vector that has at most $k$ nonzero components). Of course, the
assumption will be that such an $\mathbf{x}$ exists.

To make writing in the rest of the paper easier, we will assume the so-called linear regime, i.e. we will
assume that $k = \beta n$ and that the number of equations is $m = \alpha n$ where $\alpha$ and $\beta$ are constants independent
of $n$ (more on the non-linear regime, i.e. on the regime when $m$ is larger than linearly proportional to $k$ can
be found in e.g. [11,27,28]).

We generally distinguish two classes of possible algorithms that can be developed for solving (1). The
first class of algorithms assumes freedom in designing matrix $A$. If one has the freedom to design matrix
$A$ then the results from [3,38,42] demonstrated that the techniques from coding theory (based on the
coding/decoding of Reed-Solomon codes) can be employed to determine any $k$-sparse $\mathbf{x}$ in (1) for any
$0 < \alpha \le 1$ and any $\beta \le \frac{\alpha}{2}$ in polynomial time. It is relatively easy to show that under the unique
recoverability assumption $\beta$ can not be greater than $\frac{\alpha}{2}$. Therefore, as long as one is concerned with the unique
recovery of $k$-sparse $\mathbf{x}$ in (1) in polynomial time the results from [3,38,42] are optimal. The complexity of
algorithms from [3,38,42] is roughly $O(n^3)$. In a similar fashion one can, instead of using coding/decoding
techniques associated with Reed-Solomon codes, design the matrix and the corresponding recovery
algorithm based on the techniques related to the coding/decoding of Expander codes (see e.g. [34,35,55] and
references therein). In that case recovering $\mathbf{x}$ in (1) is significantly faster for large dimensions $n$. Namely,
the complexity of the techniques from e.g. [34,35,55] (or their slight modifications) is usually $O(n)$ which
is clearly for large $n$ significantly smaller than $O(n^3)$. However, the techniques based on coding/decoding
of Expander codes usually do not allow for $\beta$ to be as large as $\frac{\alpha}{2}$.

Figure 1: Model of a linear system; vector $\mathbf{x}$ is $k$-sparse

The main interest of this paper however will be the algorithms from the second class. Within the second 
class are the algorithms that should be designed without having the choice of A (instead the matrix A is 
rather given to us). Designing the algorithms from the second class is substantially harder compared to the 
design of the algorithms from the first class. The main reason for hardness is that when there is no choice in 
A the recovery problem (1) becomes NP-hard. The following two algorithms (and their different variations) 
have been often viewed historically as solid heuristics for solving (1) (in recent years belief propagation type 
of algorithms are emerging as strong alternatives as well): 

1. Orthogonal matching pursuit - OMP

2. Basis pursuit - $\ell_1$-optimization.

Under certain probabilistic assumptions on the elements of $A$ it can be shown (see e.g. [41,51,52]) that
if $m = O(k\log(n))$ OMP (or slightly modified OMP) can recover $\mathbf{x}$ in (1) with complexity of recovery
$O(n^2)$. On the other hand a stage-wise OMP from [24] recovers $\mathbf{x}$ in (1) with complexity of recovery
$O(n \log n)$. Somewhere in between OMP and BP are recent improvements CoSAMP (see e.g. [40]) and
Subspace pursuit (see e.g. [12]), which guarantee (assuming the linear regime) that the $k$-sparse $\mathbf{x}$ in (1) can
be recovered in polynomial time with $m = O(k)$ equations.

We will now further narrow down our interest to only the performance of $\ell_1$-optimization. (Variations
of the standard $\ell_1$-optimization from e.g. [9,10,46] as well as those from [14,26,30-32,45] related to
$\ell_q$-optimization, $0 < q < 1$, are possible as well.) The basic $\ell_1$-optimization algorithm finds $\mathbf{x}$ in (1) by solving
the following $\ell_1$-norm minimization problem

$$\begin{array}{ll}
\min & \|\mathbf{x}\|_1 \\
\text{subject to} & A\mathbf{x} = \mathbf{y}.
\end{array} \qquad (2)$$
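Problem (2) is a linear program, which is why it can be solved in polynomial time. A standard way to see this (a textbook reformulation, not a construction specific to this paper) is to introduce an auxiliary vector $\mathbf{t}$, a name used here only for illustration, that bounds the magnitudes of the components of $\mathbf{x}$:

```latex
\begin{array}{ll}
\min_{\mathbf{x},\mathbf{t}} & \sum_{i=1}^{n} \mathbf{t}_i \\
\text{subject to} & -\mathbf{t}_i \le \mathbf{x}_i \le \mathbf{t}_i, \quad 1 \le i \le n \\
 & A\mathbf{x} = \mathbf{y}.
\end{array}
```

At the optimum one has $\mathbf{t}_i = |\mathbf{x}_i|$, so this program and (2) share the same optimal value and the same minimizers $\mathbf{x}$.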

Due to its popularity the literature on the use of the above algorithm is rapidly growing. Below we restrict
our attention to the two works that, in our view, are the most influential among those that relate to (2).
The first one is [7] where the authors were able to show that if $\alpha$ and $n$ are given, $A$ is given and
satisfies the restricted isometry property (RIP) (more on this property the interested reader can find in e.g.
[1,4,6,7,44]), then any unknown vector $\mathbf{x}$ with no more than $k = \beta n$ (where $\beta$ is a constant dependent
on $\alpha$ and explicitly calculated in [7]) non-zero elements can be recovered by solving (2). As expected, this
assumes that $\mathbf{y}$ was in fact generated by that $\mathbf{x}$ and given to us. The case when the available $\mathbf{y}$'s are noisy
versions of real $\mathbf{y}$'s is also of interest [7,8,33,54]. Although that case is not of primary interest in the
present paper it is worth mentioning that the recent popularity of $\ell_1$-optimization in compressed sensing is
significantly due to its robustness with respect to noisy $\mathbf{y}$'s. (Of course, the main reason for its popularity is
its ability to solve (1) for a very wide range of matrices $A$; more on this universality from a statistical point
of view the interested reader can find in [22].)

However, the RIP is only a sufficient condition for $\ell_1$-optimization to produce the $k$-sparse solution of
(1). Instead of characterizing $A$ through the RIP condition, in [15,16] Donoho looked at its geometric
properties/potential. Namely, in [15,16] Donoho considered the polytope obtained by projecting the regular
$n$-dimensional cross-polytope $C_p^n$ by $A$. He then established that the solution of (2) will be the $k$-sparse
solution of (1) if and only if $AC_p^n$ is centrally $k$-neighborly (for the definitions of neighborliness, details
of Donoho's approach, and related results the interested reader can consult the by now classic references
[15-18]). In a nutshell, using the results of [2,5,39,43,53], it is shown in [16] that if $A$ is a random $m \times n$
ortho-projector matrix then with overwhelming probability $AC_p^n$ is centrally $k$-neighborly (as usual, under
overwhelming probability we in this paper assume a probability that is no more than a number exponentially
decaying in $n$ away from 1). Miraculously, [15,16] provided a precise characterization of $m$ and $k$ (in a
large dimensional context) for which this happens.

It should be noted that one usually considers success of (2) in recovering any given $k$-sparse $\mathbf{x}$ in (1).
It is also of interest to consider success of (2) in recovering almost any given $\mathbf{x}$ in (1). Below we make a
distinction between these cases and recall some of the definitions from [16,17,19,21,48,49].

Clearly, for any given constant $\alpha \le 1$ there is a maximum allowable value of $\beta$ such that for any given
$k$-sparse $\mathbf{x}$ in (1) the solution of (2) is with overwhelming probability exactly that given $k$-sparse $\mathbf{x}$. We
will refer to this maximum allowable value of $\beta$ as the strong threshold (see [16]). Similarly, for any given
constant $\alpha \le 1$ and any given $\mathbf{x}$ with a given fixed location of non-zero components and a given fixed
combination of its elements signs there will be a maximum allowable value of $\beta$ such that (2) finds that
given $\mathbf{x}$ in (1) with overwhelming probability. We will refer to this maximum allowable value of $\beta$ as the
weak threshold and will denote it by $\beta_w$ (see, e.g. [48,49]).

In our own work [49] we provided a novel probabilistic framework for performance characterization
of (2) (the framework seems rather powerful; in fact, we found hardly any sparse type of problem that the
framework was not able to handle with almost impeccable precision). Using that framework we obtained
lower bounds on $\beta_w$. These lower bounds were in an excellent numerical agreement with the values obtained
for $\beta_w$ in [16]. One would therefore be tempted to believe that our lower bounds from [49] are tight. In
this paper we design a mechanism that can be used to compute the upper bounds on $\beta_w$ (as it was the case
with the framework of [49], the new framework does not seem to be restricted in any way to the $\ell_1$ type of
sparsity). The obtained upper bounds will match the lower bounds computed in [49] and essentially make
them optimal. We should as an important side point mention that in a companion paper [47] we designed
an approach that reveals even qualitative (not only numerical) agreement between the results from [49] and
those from [16]. Obviously, using the results of [47] one can then also argue that the lower bounds of [49]
are optimal. The point of the present work, though, should be viewed in a much broader context. It gives a
general framework for bounding thresholds without relying on making them equivalent to the known optimal
ones. Or in other words, it extends the range of applicability to the cases where the optimal ones may not
be known. A collection of interesting applications of this framework will be presented in a few forthcoming
companion papers.

We organize the rest of the paper in the following way. In Section 2 we introduce two key theorems that
will be the heart of our subsequent analysis. In Section 3 we create the mechanism for computing the upper
bounds on $\beta_w$ in the case of general sparse signals $\mathbf{x}$ for a class of random matrices $A$. In Section 4 we
will then specialize results from Section 3 to the so-called signed vectors $\mathbf{x}$. Finally, in Section 5 we discuss
obtained results.



2 Key theorems 

In this section we introduce two useful theorems that will be of key importance in our subsequent analysis.
First we recall a null-space characterization of $A$ that guarantees that the solution of (2) is the $k$-sparse
solution of (1). Moreover, the characterization will establish this for any $\beta n$-sparse $\mathbf{x}$ with a fixed location
of nonzero components and a fixed combination of signs of its elements. Since the analysis will clearly
be irrelevant with respect to what particular location and what particular combination of signs of nonzero
elements are chosen, we can for the simplicity of the exposition and without loss of generality assume
that the components $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{n-k}$ of $\mathbf{x}$ are equal to zero and the components $\mathbf{x}_{n-k+1}, \mathbf{x}_{n-k+2}, \dots, \mathbf{x}_n$
of $\mathbf{x}$ are smaller than or equal to zero. Moreover, throughout the paper we will call such an $\mathbf{x}$ $k$-sparse
and non-positive. Under this assumption we have the following theorem from [48] that provides such a
characterization (similar characterizations can be found in [20,23,25,37,50,56,57]; furthermore, if instead
of $\ell_1$ one, for example, uses an $\ell_q$-optimization ($0 < q < 1$) in (2) then characterizations similar to the ones
from [20,23,25,37,50,56,57] can be derived as well [30-32]).

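Theorem 1. (Nonzero part of $\mathbf{x}$ has fixed signs and location) Assume that an $m \times n$ matrix $A$ is given. Let
$\mathbf{x}$ be a $k$-sparse non-positive vector. Also let $\mathbf{x}_1 = \mathbf{x}_2 = \dots = \mathbf{x}_{n-k} = 0$. Further, assume that $\mathbf{y} = A\mathbf{x}$
and that $\mathbf{w}$ is an $n \times 1$ vector. If

$$(\forall \mathbf{w} \in R^n \mid A\mathbf{w} = 0) \quad \sum_{i=n-k+1}^{n} \mathbf{w}_i < \sum_{i=1}^{n-k} |\mathbf{w}_i| \qquad (3)$$

then the solution of (2) is $\mathbf{x}$. Moreover, if

$$(\exists \mathbf{w} \in R^n \mid A\mathbf{w} = 0) \quad \sum_{i=n-k+1}^{n} \mathbf{w}_i > \sum_{i=1}^{n-k} |\mathbf{w}_i| \qquad (4)$$

then there will be a $k$-sparse non-positive $\mathbf{x}$ that satisfies (1) and is not the solution of (2).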

Proof. The first part follows directly from Theorem 2 in [48]. For completeness we sketch the
argument again. Let $\hat{\mathbf{x}}$ be the solution of (2). We want to show that if (3) holds then $\hat{\mathbf{x}} = \mathbf{x}$. To that end
assume the opposite, i.e. assume that (3) holds but $\hat{\mathbf{x}} \neq \mathbf{x}$. Then since $\mathbf{y} = A\hat{\mathbf{x}}$ and $\mathbf{y} = A\mathbf{x}$ one must have
$\hat{\mathbf{x}} = \mathbf{x} + \mathbf{w}$ with $\mathbf{w}$ such that $A\mathbf{w} = 0$. Also, since $\hat{\mathbf{x}}$ is the solution of (2) one has that

$$\sum_{i=1}^{n} |-\mathbf{x}_i - \mathbf{w}_i| \le \sum_{i=1}^{n} |\mathbf{x}_i|. \qquad (5)$$

Now, the key point observed for the first time in Theorem 2 in [48] comes into play. By just simply
"removing the absolute values" on the last $k$ elements of the sum on the left-hand side one has that the following
must hold as well

$$\sum_{i=1}^{n-k} |-\mathbf{x}_i - \mathbf{w}_i| + \sum_{i=n-k+1}^{n} (-\mathbf{x}_i - \mathbf{w}_i) \le \sum_{i=1}^{n} |\mathbf{x}_i|. \qquad (6)$$

Since we assumed that $\mathbf{x}_i \le 0$, $n-k+1 \le i \le n$, and $\mathbf{x}_i = 0$, $1 \le i \le n-k$, one from (6) obtains

$$\sum_{i=1}^{n-k} |\mathbf{w}_i| - \sum_{i=n-k+1}^{n} \mathbf{w}_i \le 0 \qquad (7)$$

or equivalently

$$\sum_{i=1}^{n-k} |\mathbf{w}_i| \le \sum_{i=n-k+1}^{n} \mathbf{w}_i. \qquad (8)$$

Clearly, (8) contradicts (3) and $\hat{\mathbf{x}} \neq \mathbf{x}$ can not hold. Therefore $\hat{\mathbf{x}} = \mathbf{x}$ which is exactly what the first part of
the theorem claims.

For the "moreover" part assume that (4) holds, i.e. we assume

$$(\exists \mathbf{w} \in R^n \mid A\mathbf{w} = 0) \quad \sum_{i=n-k+1}^{n} \mathbf{w}_i > \sum_{i=1}^{n-k} |\mathbf{w}_i| \qquad (9)$$

and want to show that there is a non-positive $k$-sparse $\mathbf{x}$ such that (5) holds (with a strict inequality). This
would imply that there is a non-positive $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{y}$ and $\mathbf{x}$ is not the solution of (2). Since (8) is just
rewritten (9) one can go backwards from (8) to (5) (just additionally making all the inequalities strict in the
process). The only problem will happen in a backward jump from (6) to (5). That jump will be fine pretty
much for any non-positive $k$-sparse $\mathbf{x}$ if all $\mathbf{w}_j$'s are negative. If some of them are not then one has to design
a specific non-positive $k$-sparse $\mathbf{x}$ for which the jump from (6) to (5) is justified or in other words for which (6)
and (5) are equivalent. To that end let us assume that $\mathbf{w}_j > 0$ for $j \in J$, $J \subseteq \{n-k+1, n-k+2, \dots, n\}$
and let $\bar{J}$ be such that $J \cup \bar{J} = \{n-k+1, n-k+2, \dots, n\}$ and $J \cap \bar{J} = \emptyset$. Then for $\mathbf{x}$ such that
$\mathbf{x}_j = 0$ for $1 \le j \le n-k$, $\mathbf{x}_j = -\mathbf{w}_j$ for $j \in J$, and $\mathbf{x}_j < 0$ for $j \in \bar{J}$ one has that (9) implies

$$\sum_{i=1}^{n} |-\mathbf{x}_i - \mathbf{w}_i| < \sum_{i=1}^{n} |\mathbf{x}_i| \qquad (10)$$

or in other words that $\mathbf{x}$ can not be the solution of (2). This concludes the proof of the second ("moreover")
part. □
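The construction used in the "moreover" part can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names `l1_norm` and `build_counterexample` and the numbers in the example are hypothetical, and $\mathbf{w}$ is simply assumed to lie in the null-space of some $A$ (no matrix is formed). It builds $\mathbf{x}$ as in the proof and confirms that $\mathbf{x} + \mathbf{w}$ has a strictly smaller $\ell_1$ norm than $\mathbf{x}$.

```python
def l1_norm(v):
    """Sum of absolute values of the entries of v."""
    return sum(abs(t) for t in v)


def build_counterexample(w, k, filler=-1.0):
    """Build the k-sparse non-positive x from the converse argument.

    Assumes w is a null-space vector (A w = 0) satisfying condition (4):
    the sum of the last k entries exceeds the l1 norm of the first n-k.
    `filler` is an arbitrary negative value used where w_j <= 0.
    """
    n = len(w)
    head, tail = w[: n - k], w[n - k:]
    if not sum(tail) > l1_norm(head):
        raise ValueError("w does not satisfy condition (4)")
    x = [0.0] * (n - k)
    for wj in tail:
        # x_j = -w_j on the index set J (w_j > 0), any negative value elsewhere
        x.append(-wj if wj > 0 else filler)
    return x


# Hypothetical example with n = 4, k = 2.
w = [0.5, -0.5, 2.0, -0.5]
x = build_counterexample(w, k=2)
x_plus_w = [xi + wi for xi, wi in zip(x, w)]
print(x, l1_norm(x), l1_norm(x_plus_w))
```

Since $A(\mathbf{x} + \mathbf{w}) = A\mathbf{x} = \mathbf{y}$, the vector $\mathbf{x} + \mathbf{w}$ is feasible in (2) and has a smaller norm, so $\mathbf{x}$ can not be the solution of (2).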

Before proceeding further we would like to say a few words about the above theorem. In our opinion
the first part of the theorem that was put forth in [48] is the unsung hero of all the success achieved in the
thresholds analysis through various frameworks that we eventually designed. It was first recognized in [48]
that it could lead to the optimal performance characterizations of $\ell_1$-optimization. However, the analysis
in [48] stopped somewhat short of the ultimate goal and it achieved only a moderate success in performance
characterization of $\ell_1$-optimization (of course, not to take anything away from the strength of the first part of
the above theorem, it was strong enough even back then when the framework of [48] was designed; it is just
that the author of [48] was apparently not skilled enough to utilize it). Eventually, it was put to an ultimate
utilization in [49]. While the success of the framework designed in [49] is a story on its own we feel that
this simple "removing absolute values" observation made in Theorem 2 in [48] is a remarkable piece of the
mosaic that makes everything work to perfection.

Now, with regard to the second part of the above theorem, the story is somewhat similar. The first thing
one should say is that the above proof of it is really nothing too original; it follows the well known "converse"
strategy of the corresponding proofs when the absolute values are present (see, e.g. [23,30]). It is just that
we never presented this "remove the absolute values" observation in a converse way before. Basically, we
did not find the second part of the theorem to be of any (let alone much) use if one were to create the lower
bounds on the thresholds. However, as the reader might guess, if one is concerned with proving the upper
bounds the second part of the above theorem becomes the same type of the unsung hero that the first one
was for the success of the framework of [49]. Below we use it to create a machinery as powerful as the one
from [49] that provides the corresponding framework for upper-bounding the thresholds.

Before moving to the design of the framework, we would also like to say a few words about a possible
design of the matrix $A$ that would satisfy the conditions of Theorem 1. Designing matrix $A$ such that (3)
holds would not be that hard. The problem is that one does not know a priori which k components of x 
will be nonzero and which signs they will have. That would essentially force one to design A such that 
(3) holds for any subset of {1, 2, . . . , n} of cardinality k and any combination of signs on that subset. If 
one assumes that m and k are proportional to n (the case of our interest in this paper) this is an enormous 
combinatorial task and the construction of such a deterministic matrix A is clearly not easy (in fact, as 
observed in e.g. [49] one may say that it is one of the most fundamental open problems in the area of 
theoretical compressed sensing; more on an equally important inverse problem of checking if a given matrix 
satisfies the condition of Theorem 1 for any subset of {1, 2, . . . , n} of cardinality k and any combination of 
signs, the interested reader can find in [13, 36]). On the other hand, turning to random matrices significantly 
simplifies things. As we will see later in the paper, Gaussian random matrices A will turn out to be a very 
convenient choice. The following phenomenal result from [29] that relates to such matrices will be the key 
ingredient in the analysis that will follow. 

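Theorem 2. ([29]) Let $X_{ij}$ and $Y_{ij}$, $1 \le i \le n$, $1 \le j \le m$, be two centered Gaussian processes which
satisfy the following inequalities for all choices of indices

1. $E(X_{ij}^2) = E(Y_{ij}^2)$

2. $E(X_{ij}X_{ik}) \ge E(Y_{ij}Y_{ik})$

3. $E(X_{ij}X_{lk}) \le E(Y_{ij}Y_{lk})$, $i \neq l$.

Then

$$P\left(\bigcap_{i}\bigcup_{j}(X_{ij} \ge \lambda_{ij})\right) \le P\left(\bigcap_{i}\bigcup_{j}(Y_{ij} \ge \lambda_{ij})\right).$$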

3 Upper-bounding $\beta_w$ - general $\mathbf{x}$

In this section we probabilistically analyze validity of the null-space characterization given in the second
part of Theorem 1. Essentially, we will design a mechanism for computing upper bounds on $\beta_w$ (in fact,
since it will be slightly more convenient we will actually determine lower bounds on $\alpha$; that is of course
conceptually the same as finding the upper bounds on $\beta$).

We start by defining a quantity $\tau$ that will play one of the key roles below

$$\begin{array}{ll}
\tau(A) = \min & \left(\sum_{i=1}^{n-k} |\mathbf{w}_i| - \sum_{i=n-k+1}^{n} \mathbf{w}_i\right) \\
\text{subject to} & A\mathbf{w} = 0 \\
 & \|\mathbf{w}\|_2 \le 1.
\end{array} \qquad (11)$$

Now, we will in the rest of the paper assume that the entries of $A$ are i.i.d. standard normal random variables.
Then one can say that for any $\alpha$ and $\beta$ for which

$$\lim_{n\rightarrow\infty} P(\tau(A) < 0) = 1, \qquad (12)$$

there is a $k$-sparse $\mathbf{x}$ (from a set of $\mathbf{x}$'s with a given fixed location of nonzero components and a given fixed
combination of their signs) which (2) with probability 1 fails to find. For a fixed $\beta$ our goal will be to find
the largest possible $\alpha$ for which (12) holds, i.e. for which (2) fails with probability 1.

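Before going through the randomness of the problem and evaluation of $P(\tau(A) < 0)$ we will try to
provide a more explicit expression for $\tau$ than the one given by the optimization problem in (11). We proceed
by slightly rephrasing (11):

$$\begin{array}{ll}
\tau(A) = \min_{\mathbf{t},\mathbf{w}} & \left(\sum_{i=1}^{n-k} \mathbf{t}_i - \sum_{i=n-k+1}^{n} \mathbf{w}_i\right) \\
\text{subject to} & -\mathbf{t}_i \le \mathbf{w}_i \le \mathbf{t}_i, \quad 1 \le i \le n-k \\
 & A\mathbf{w} = 0 \\
 & \|\mathbf{w}\|_2 \le 1.
\end{array} \qquad (13)$$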

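Now we write a partial dual over $\mathbf{t}$ of the optimization problem in (13) (the strong duality trivially holds
throughout the rest of this derivation)

$$\begin{array}{ll}
\tau(A) = \max_{\lambda^{(1)},\lambda^{(2)}} \min_{\mathbf{t},\mathbf{w}} & \left(\sum_{i=1}^{n-k} \mathbf{t}_i - \sum_{i=n-k+1}^{n} \mathbf{w}_i + \sum_{i=1}^{n-k} \lambda_i^{(1)}(\mathbf{w}_i - \mathbf{t}_i) - \sum_{i=1}^{n-k} \lambda_i^{(2)}(\mathbf{w}_i + \mathbf{t}_i)\right) \\
\text{subject to} & \lambda_i^{(1)} \ge 0, \quad 1 \le i \le n-k \\
 & \lambda_i^{(2)} \ge 0, \quad 1 \le i \le n-k \\
 & A\mathbf{w} = 0 \\
 & \|\mathbf{w}\|_2 \le 1.
\end{array} \qquad (14)$$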
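After regrouping the terms one has

$$\begin{array}{ll}
\tau(A) = \max_{\lambda^{(1)},\lambda^{(2)}} \min_{\mathbf{t},\mathbf{w}} & \left(\sum_{i=1}^{n-k} (1 - \lambda_i^{(1)} - \lambda_i^{(2)})\mathbf{t}_i - \sum_{i=n-k+1}^{n} \mathbf{w}_i + \sum_{i=1}^{n-k} \lambda_i^{(1)}\mathbf{w}_i - \sum_{i=1}^{n-k} \lambda_i^{(2)}\mathbf{w}_i\right) \\
\text{subject to} & \lambda_i^{(1)} \ge 0, \quad 1 \le i \le n-k \\
 & \lambda_i^{(2)} \ge 0, \quad 1 \le i \le n-k \\
 & A\mathbf{w} = 0 \\
 & \|\mathbf{w}\|_2 \le 1.
\end{array} \qquad (15)$$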

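To make the inner minimization over $\mathbf{t}$ bounded one must have $(1 - \lambda_i^{(1)} - \lambda_i^{(2)}) = 0$ for any $1 \le i \le (n-k)$.
Replacing that back in (15) we obtain

$$\begin{array}{ll}
\tau(A) = \max_{\lambda^{(1)},\lambda^{(2)}} \min_{\mathbf{w}} & \left(-\sum_{i=n-k+1}^{n} \mathbf{w}_i + \sum_{i=1}^{n-k} \lambda_i^{(1)}\mathbf{w}_i - \sum_{i=1}^{n-k} \lambda_i^{(2)}\mathbf{w}_i\right) \\
\text{subject to} & \lambda_i^{(1)} \ge 0, \quad 1 \le i \le n-k \\
 & \lambda_i^{(2)} \ge 0, \quad 1 \le i \le n-k \\
 & 1 - \lambda_i^{(1)} - \lambda_i^{(2)} = 0, \quad 1 \le i \le n-k \\
 & A\mathbf{w} = 0 \\
 & \|\mathbf{w}\|_2 \le 1.
\end{array} \qquad (16)$$

We now write the remaining part of the dual over $\mathbf{w}$, with $\nu$ and $\gamma \ge 0$ being the multipliers of $A\mathbf{w} = 0$
and $\|\mathbf{w}\|_2^2 \le 1$, respectively (this part of the dual could have been written together
with the first one over $\mathbf{t}$; to make the expressions lighter we split the writing in two steps)

$$\begin{array}{ll}
\tau(A) = \max_{\lambda^{(1)},\lambda^{(2)},\nu,\gamma} \min_{\mathbf{w}} & \left(-\sum_{i=n-k+1}^{n} \mathbf{w}_i + \sum_{i=1}^{n-k} \lambda_i^{(1)}\mathbf{w}_i - \sum_{i=1}^{n-k} \lambda_i^{(2)}\mathbf{w}_i + \nu^T A\mathbf{w} + \gamma\sum_{i=1}^{n} \mathbf{w}_i^2 - \gamma\right) \\
\text{subject to} & \lambda_i^{(1)} \ge 0, \quad 1 \le i \le n-k \\
 & \lambda_i^{(2)} \ge 0, \quad 1 \le i \le n-k \\
 & 1 - \lambda_i^{(1)} - \lambda_i^{(2)} = 0, \quad 1 \le i \le n-k.
\end{array} \qquad (17)$$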
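Further, using the last constraint one can get rid of one of the $\lambda$'s (say $\lambda^{(1)}$) and obtain

$$\begin{array}{ll}
\tau(A) = \max_{\lambda^{(2)},\nu,\gamma} \min_{\mathbf{w}} & \left(-\sum_{i=n-k+1}^{n} \mathbf{w}_i + \sum_{i=1}^{n-k} (1 - 2\lambda_i^{(2)})\mathbf{w}_i + \nu^T A\mathbf{w} + \gamma\sum_{i=1}^{n} \mathbf{w}_i^2 - \gamma\right) \\
\text{subject to} & \lambda_i^{(2)} \le 1, \quad 1 \le i \le n-k \\
 & \lambda_i^{(2)} \ge 0, \quad 1 \le i \le n-k.
\end{array} \qquad (18)$$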



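At this point we proceed by solving the inner minimization over $\mathbf{w}$. To that end, let

$$f_1(\lambda^{(2)},\nu,\gamma,\mathbf{w}) = -\sum_{i=n-k+1}^{n} \mathbf{w}_i + \sum_{i=1}^{n-k} (1 - 2\lambda_i^{(2)})\mathbf{w}_i + \nu^T A\mathbf{w} + \gamma\sum_{i=1}^{n} \mathbf{w}_i^2 - \gamma. \qquad (19)$$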



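Since $f_1(\cdot)$ is convex in $\mathbf{w}$ we simply find the optimal $\mathbf{w}$ by setting the derivative of $f_1(\cdot)$ with respect to
$\mathbf{w}$ equal to zero. We then have

$$\begin{array}{rcl}
\frac{d f_1(\lambda^{(2)},\nu,\gamma,\mathbf{w})}{d\mathbf{w}_i} & = & (1 - 2\lambda_i^{(2)}) + \nu^T A_i + 2\gamma\mathbf{w}_i, \quad 1 \le i \le n-k \\
\frac{d f_1(\lambda^{(2)},\nu,\gamma,\mathbf{w})}{d\mathbf{w}_i} & = & -1 + \nu^T A_i + 2\gamma\mathbf{w}_i, \quad n-k+1 \le i \le n
\end{array} \qquad (20)$$

where as expected $A_i$ is the $i$-th column of $A$. Let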

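$$\mathbf{z} = [(1 - 2\lambda_1^{(2)}), (1 - 2\lambda_2^{(2)}), \dots, (1 - 2\lambda_{n-k}^{(2)}), -1, -1, \dots, -1]^T. \qquad (21)$$

From (20) one easily finds

$$\mathbf{w}_{opt} = \frac{-\mathbf{z} - A^T\nu}{2\gamma}. \qquad (22)$$

Removing the inner minimization over $\mathbf{w}$ in (18) and recognizing the relation between $\mathbf{z}$ and $\lambda^{(2)}$ we have

$$\begin{array}{ll}
\tau(A) = \max_{\mathbf{z},\nu,\gamma} & (\mathbf{z} + A^T\nu)^T\mathbf{w}_{opt} + \gamma\|\mathbf{w}_{opt}\|_2^2 - \gamma \\
\text{subject to} & |\mathbf{z}_i| \le 1, \quad 1 \le i \le n-k \\
 & \mathbf{z}_i = -1, \quad n-k+1 \le i \le n.
\end{array} \qquad (23)$$

Finally, after plugging $\mathbf{w}_{opt}$ from (22) in (23) and, since $\nu$ is not constrained in sign, replacing $\nu$ by $-\nu$,
we obtain under the constraints of (23)

$$\tau(A) = \max_{\mathbf{z},\nu,\gamma} \left(-\frac{\|\mathbf{z} - A^T\nu\|_2^2}{2\gamma} + \frac{\|\mathbf{z} - A^T\nu\|_2^2}{4\gamma} - \gamma\right)$$

or in a more convenient form

$$\begin{array}{ll}
\tau(A) = \max_{\mathbf{z},\nu,\gamma} & -\frac{\|\mathbf{z} - A^T\nu\|_2^2}{4\gamma} - \gamma \\
\text{subject to} & |\mathbf{z}_i| \le 1, \quad 1 \le i \le n-k \\
 & \mathbf{z}_i = -1, \quad n-k+1 \le i \le n.
\end{array} \qquad (24)$$

The maximization over $\gamma$ is then trivial (the optimal value is $\gamma = \|\mathbf{z} - A^T\nu\|_2/2$) and one finally has

$$\begin{array}{ll}
\tau(A) = \max_{\mathbf{z},\nu} & -\|\mathbf{z} - A^T\nu\|_2 \\
\text{subject to} & |\mathbf{z}_i| \le 1, \quad 1 \le i \le n-k \\
 & \mathbf{z}_i = -1, \quad n-k+1 \le i \le n
\end{array} \qquad (25)$$

or equivalently

$$\begin{array}{ll}
\tau(A) = -\min_{\mathbf{z},\nu} & \|\mathbf{z} - A^T\nu\|_2 \\
\text{subject to} & |\mathbf{z}_i| \le 1, \quad 1 \le i \le n-k \\
 & \mathbf{z}_i = -1, \quad n-k+1 \le i \le n.
\end{array} \qquad (26)$$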
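The maximization over $\gamma$ in (24) can be double-checked numerically. In the sketch below the norm $\|\mathbf{z} - A^T\nu\|_2$ is treated as a fixed positive number `c` (a hypothetical value chosen only for illustration), and the function name `dual_value` is likewise an assumption of this example. The maximizer $\gamma = c/2$ follows from setting the derivative of $-c^2/(4\gamma) - \gamma$ to zero, and the resulting value is $-c$, in agreement with (25).

```python
def dual_value(c, gamma):
    """Objective of (24) for a fixed value c = ||z - A^T nu||_2 and gamma > 0."""
    return -c * c / (4.0 * gamma) - gamma


c = 3.0                      # hypothetical value of the norm
gamma_star = c / 2.0         # stationary point of the concave objective
best = dual_value(c, gamma_star)

# A coarse grid search over gamma should never beat the stationary point.
grid = [0.01 * j for j in range(1, 1001)]
grid_best = max(dual_value(c, gam) for gam in grid)
print(best, grid_best)
```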

At this point we are almost ready to switch to the probabilistic aspect of the analysis. To that end we do the
last piece of transformation. Namely, we rewrite (26) as

$$\begin{array}{ll}
\tau(A) = -\min_{\mathbf{z},\nu} \max_{\|\mathbf{a}\|_2 = 1} & \mathbf{a}^T(\mathbf{z} - A^T\nu) \\
\text{subject to} & |\mathbf{z}_i| \le 1, \quad 1 \le i \le n-k \\
 & \mathbf{z}_i = -1, \quad n-k+1 \le i \le n.
\end{array} \qquad (27)$$

Now we are ready to invoke the results from Theorem 2. We do so through the following lemma which is a
slightly modified Lemma 3.1 from [29] (Lemma 3.1 is a direct consequence of Theorem 2 and the backbone
of the escape through a mesh theorem utilized in [49]).

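Lemma 1. Let $A$ be an $m \times n$ matrix with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $n \times 1$ and
$m \times 1$ vectors, respectively, with i.i.d. standard normal components. Also, let $g$ be a standard normal random
variable and let $Z$ be a set such that $Z = \{\mathbf{z} \mid \mathbf{z}_i = -1, n-k+1 \le i \le n \text{ and } |\mathbf{z}_i| \le 1, 1 \le i \le n-k\}$.
Then

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \max_{\|\mathbf{a}\|_2=1} (-\mathbf{a}^T A^T\nu + \|\nu\|_2 g - \zeta_{\mathbf{a},\mathbf{z},\nu}) \ge 0\right) \ge P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \max_{\|\mathbf{a}\|_2=1} \left(\|\nu\|_2\sum_{i=1}^{n} \mathbf{g}_i\mathbf{a}_i + \sum_{i=1}^{m} \mathbf{h}_i\nu_i - \zeta_{\mathbf{a},\mathbf{z},\nu}\right) \ge 0\right). \qquad (28)$$

Proof. The proof is exactly the same as the one of Lemma 3.1 in [29]. The only difference is that one should
make identical copies over $\mathbf{z}$ of the processes $X$, $Y$ defined in that proof. The rest of the proof remains
unaltered. □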

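Let $\zeta_{\mathbf{a},\mathbf{z},\nu} = \epsilon_5^{(g)}\sqrt{n}\|\nu\|_2 - \mathbf{a}^T\mathbf{z}$ with $\epsilon_5^{(g)} > 0$ being an arbitrarily small constant independent of $n$.
Then the right-hand side of the inequality in (28) is the following probability of interest

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \max_{\|\mathbf{a}\|_2=1} \left(\|\nu\|_2\sum_{i=1}^{n} \mathbf{g}_i\mathbf{a}_i + \sum_{i=1}^{m} \mathbf{h}_i\nu_i - \epsilon_5^{(g)}\sqrt{n}\|\nu\|_2 + \mathbf{a}^T\mathbf{z}\right) \ge 0\right).$$

After solving the inner maximization over $\mathbf{a}$ and pulling out $\|\nu\|_2$ one has

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \left(\left\|\mathbf{g} + \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 + \sum_{i=1}^{m} \frac{\mathbf{h}_i\nu_i}{\|\nu\|_2} - \epsilon_5^{(g)}\sqrt{n}\right) \ge 0\right).$$

Minimization of the second term then gives us

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \left\|\mathbf{g} + \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 \ge \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right). \qquad (29)$$

Since $\mathbf{h}$ is a vector of $m$ i.i.d. standard normal variables it is rather trivial that
$P(\|\mathbf{h}\|_2 \le (1 + \epsilon_1^{(m)})\sqrt{m}) \ge 1 - e^{-\epsilon_2^{(m)} m}$ where $\epsilon_1^{(m)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(m)}$ is a constant dependent on $\epsilon_1^{(m)}$ but
independent of $n$. Then from (29) one obtains

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \left\|\mathbf{g} + \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 \ge \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right) \ge (1 - e^{-\epsilon_2^{(m)} m}) P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \left\|\mathbf{g} + \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 \ge (1 + \epsilon_1^{(m)})\sqrt{m} + \epsilon_5^{(g)}\sqrt{n}\right). \qquad (30)$$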
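The concentration of $\|\mathbf{h}\|_2$ used above is easy to observe empirically. The following Monte Carlo sketch is an illustration only: the values `m = 400` and `eps = 0.1`, the number of trials and the helper name `norm_of_gaussian_vector` are all assumptions made for this example, and the simulation says nothing about the constants in the exponent.

```python
import math
import random


def norm_of_gaussian_vector(m, rng):
    """Euclidean norm of a vector of m i.i.d. standard normal entries."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m)))


rng = random.Random(0)       # fixed seed so that the run is repeatable
m, eps, trials = 400, 0.1, 200
threshold = (1.0 + eps) * math.sqrt(m)

# Count how often ||h||_2 stays below (1 + eps) * sqrt(m).
hits = sum(norm_of_gaussian_vector(m, rng) <= threshold for _ in range(trials))
fraction = hits / trials
print(fraction)
```

For these values the norm is concentrated around $\sqrt{m} = 20$ with fluctuations of order one, so almost all trials should fall below the threshold of 22.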

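Now, let $\tilde{\mathbf{g}} = [|\mathbf{g}|_{(1)}, |\mathbf{g}|_{(2)}, \dots, |\mathbf{g}|_{(n-k)}, \mathbf{g}_{n-k+1}, \mathbf{g}_{n-k+2}, \dots, \mathbf{g}_n]^T$, where $[|\mathbf{g}|_{(1)}, |\mathbf{g}|_{(2)}, \dots, |\mathbf{g}|_{(n-k)}]$ are the
magnitudes of $[\mathbf{g}_1, \mathbf{g}_2, \dots, \mathbf{g}_{n-k}]$ sorted in increasing order. Then clearly,

$$\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \left\|\mathbf{g} + \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 = \min_{\mathbf{z}\in |Z|,\nu\in R^m\setminus 0} \left\|\tilde{\mathbf{g}} - \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 \qquad (31)$$

where $|Z| = \{\mathbf{z} \mid \mathbf{z}_i = -1, n-k+1 \le i \le n \text{ and } 0 \le \mathbf{z}_i \le 1, 1 \le i \le n-k\}$ (on the last $k$ components the
equality is to be understood in distribution, which suffices for the probabilities considered here, since the
components of $\mathbf{g}$ are symmetric). Moreover, the optimization
on the right-hand side of (30) is structurally the same as the one in equation (15) in [49] (actually to be more
precise it is the same as the weak threshold equivalent to (15)). Essentially, the exact equivalence between
these optimizations is achieved after in (15) from [49] $\mathbf{h}$ is replaced by $\mathbf{g}$, $\nu$ is replaced by $\frac{1}{\|\nu\|_2}$, $\lambda$ is restricted
to the lower $(n-k)$ components, and after one additionally notes that in (15) from [49] $0 \le \lambda \le \nu$, which
corresponds to $0 \le \mathbf{z}_i \le 1$, $1 \le i \le n-k$ introduced above (that way one would in essence obtain the
weak threshold equivalent to (15); this was not explicitly written anywhere in [49] but is rather obvious;
in [49] we, instead, made a "weak" equivalence to its (29)). With these replacements one can then use the
machinery of [49] to establish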



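$$\min_{\mathbf{z}\in |Z|,\nu\in R^m\setminus 0} \left\|\tilde{\mathbf{g}} - \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 = \sqrt{\sum_{i=c_w+1}^{n} \tilde{\mathbf{g}}_i^2 - \frac{((\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w} \tilde{\mathbf{g}}_i)^2}{n - c_w}} = f_g(c_w) \qquad (32)$$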



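where $c_w$ is the solution of

$$\frac{(\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w} \tilde{\mathbf{g}}_i}{n - c_w} = \tilde{\mathbf{g}}_{c_w}. \qquad (33)$$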
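The closed form in (32) can be compared against a direct evaluation of the minimization on a small instance. The sketch below is an illustration under several assumptions that are not spelled out in this section: the vector $\mathbf{z}$ appearing in $(\tilde{\mathbf{g}}^T\mathbf{z})$ is taken to have its first $n-k$ entries equal to $1$ and its last $k$ entries equal to $-1$; the scaling $t = 1/\|\nu\|_2$ is treated as a free non-negative scalar; $c_w$ is located through the bracketing condition $\tilde{\mathbf{g}}_{c_w} \le t \le \tilde{\mathbf{g}}_{c_w+1}$ rather than through an exact equality; and all names and numbers are hypothetical.

```python
import math


def objective(t, head_mags, tail):
    """||g~ - t z||_2 after minimizing over head entries of z in [0, 1], with z_i = -1 on the tail."""
    head = sum(max(mag - t, 0.0) ** 2 for mag in head_mags)
    rest = sum((g + t) ** 2 for g in tail)
    return math.sqrt(head + rest)


def closed_form(head_mags, tail):
    """Return (c, t, value) of the closed form, scanning the breakpoints c = 0, ..., n-k."""
    g_sorted = sorted(head_mags)
    n = len(g_sorted) + len(tail)
    s = sum(g_sorted) - sum(tail)          # g~^T z with z = [1, ..., 1, -1, ..., -1]
    remaining_sq = sum(v * v for v in g_sorted) + sum(v * v for v in tail)
    prefix = 0.0
    for c in range(len(g_sorted) + 1):
        if c > 0:
            prefix += g_sorted[c - 1]
            remaining_sq -= g_sorted[c - 1] ** 2
        t = (s - prefix) / (n - c)
        lower = g_sorted[c - 1] if c > 0 else 0.0
        upper = g_sorted[c] if c < len(g_sorted) else math.inf
        if lower <= t <= upper:
            value = math.sqrt(max(remaining_sq - (s - prefix) ** 2 / (n - c), 0.0))
            return c, t, value
    raise ValueError("no non-negative stationary point; the closed form does not apply")


head_mags = [0.3, 1.2, 0.7, 2.0, 0.1, 0.9]   # hypothetical |g_i|, 1 <= i <= n-k
tail = [-0.5, 0.4]                           # hypothetical g_i, n-k+1 <= i <= n
c, t, value = closed_form(head_mags, tail)
print(c, t, value, objective(t, head_mags, tail))
```

On this instance the closed-form value coincides with the objective evaluated at the returned scaling `t`, and perturbing `t` in either direction does not decrease the objective.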



As a side remark, we should point out that the key point to the success of our method is that the derivation
of [49] establishes the equality in (32). It is just that in [49] only the "smaller than" inequality part of this
equality was utilized. At this point we have established the core of our upper-bounding arguments. The rest
is just a slightly modified repetition of the derivations from [49] so that we can make everything precise.
First we will define two quantities $c_w^{(l)}$ and $c_w^{(u)}$ as the solutions of the following two equations:

$$\begin{array}{rcl}
\frac{(1 - \epsilon_1^{(c)}) E((\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w^{(l)}} \tilde{\mathbf{g}}_i)}{n - c_w^{(l)}} - F_a^{-1}\left(\frac{(1 + \epsilon_1^{(c)}) c_w^{(l)}}{n(1 - \beta_w)}\right) & = & 0 \\
\frac{(1 + \epsilon_2^{(c)}) E((\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w^{(u)}} \tilde{\mathbf{g}}_i)}{n - c_w^{(u)}} - F_a^{-1}\left(\frac{(1 - \epsilon_2^{(c)}) c_w^{(u)}}{n(1 - \beta_w)}\right) & = & 0
\end{array} \qquad (34)$$

where $F_a^{-1}(\cdot)$ is the inverse cdf of the random variable $|X|$, $X$ is the standard normal random variable, and
$\epsilon_i^{(c)} > 0$, $1 \le i \le 2$, are arbitrarily small constants independent of $n$. It follows then directly from the
derivation (32) - (39) in [49] that

$$P(c_w \in (c_w^{(l)}, c_w^{(u)})) \ge 1 - e^{-\epsilon_3^{(c)} n} \qquad (35)$$

where $\epsilon_3^{(c)}$ is a constant dependent on $\epsilon_i^{(c)} > 0$, $1 \le i \le 2$, $c_w^{(l)}$, $c_w^{(u)}$ but independent of $n$. We now set
$c_w = c_w^{(u)}$ and focus on (32). Concentration analysis machinery of [49] will help us establish a "high
probability" lower bound on $f_g(c_w)$ (this will amount to nothing but reversing the concentration arguments
that we have established in [49]; concentration arguments are of course easy to reverse; what was harder to
reverse was the part before (32)). We now split $f_g(c_w)$ into two parts, i.e.



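$$f_g(c_w) = \sqrt{f_g^{(1)}(c_w) - \frac{(f_g^{(2)}(c_w))^2}{n - c_w}}, \qquad (36)$$

where $f_g^{(1)}(c_w) = \sum_{i=c_w+1}^{n} \tilde{\mathbf{g}}_i^2$ and $f_g^{(2)}(c_w) = (\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w} \tilde{\mathbf{g}}_i$. Now, $f_g^{(1)}(c_w)$ concentrates trivially, the
argument is the same as the one that can be established when $c_w = 0$ (alternatively one can repeat derivation
(37) from [49] to obtain the Lipschitz constant and combine it with Lipschitz concentration formula (35)
also in [49]). So we have

$$P(f_g^{(1)}(c_w) \ge (1 - \epsilon_1^{(g)}) E f_g^{(1)}(c_w)) \ge 1 - e^{-\epsilon_2^{(g)} n}, \qquad (37)$$

where again as usual $\epsilon_1^{(g)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(g)}$ is a constant dependent on $\epsilon_1^{(g)}$ and $c_w$ but
independent of $n$. On the other hand, concentration of $f_g^{(2)}(c_w)$ follows by reversing (38) from [49], i.e.

$$P(f_g^{(2)}(c_w) \le (1 + \epsilon_3^{(g)}) E f_g^{(2)}(c_w)) \ge 1 - e^{-\epsilon_4^{(g)} n}, \qquad (38)$$

where again as usual $\epsilon_3^{(g)} > 0$ is an arbitrarily small constant and $\epsilon_4^{(g)}$ is a constant dependent on $\epsilon_3^{(g)}$ and
$c_w$ but independent of $n$. Combination of (32), (37), and (38) gives (the only other thing one should observe
here is that $E((\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w} \tilde{\mathbf{g}}_i) \ge 0$)

$$P\left(\sqrt{\sum_{i=c_w+1}^{n} \tilde{\mathbf{g}}_i^2 - \frac{((\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w} \tilde{\mathbf{g}}_i)^2}{n - c_w}} \ge \sqrt{(1 - \epsilon_1^{(g)}) E\sum_{i=c_w+1}^{n} \tilde{\mathbf{g}}_i^2 - \frac{((1 + \epsilon_3^{(g)}) E((\tilde{\mathbf{g}}^T\mathbf{z}) - \sum_{i=1}^{c_w} \tilde{\mathbf{g}}_i))^2}{n - c_w}}\right) \ge (1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n}). \qquad (39)$$

Now, let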



(i+4"") 2 \ 



*=Cu, + l 

(40) 

where > is an arbitrarily small constant. Combining (30), (32), (35), (39), and (40) we have 

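$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \left\|\mathbf{g} + \frac{1}{\|\nu\|_2}\mathbf{z}\right\|_2 \ge \|\mathbf{h}\|_2 + \epsilon_5^{(g)}\sqrt{n}\right) \ge (1 - e^{-\epsilon_2^{(m)} m})(1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n})(1 - e^{-\epsilon_3^{(c)} n}). \qquad (41)$$

Further combination of (28), (29), (30), and (41) gives us that if $m = m_w$

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \max_{\|\mathbf{a}\|_2=1} (-\mathbf{a}^T A^T\nu + \mathbf{a}^T\mathbf{z} + \|\nu\|_2(g - \epsilon_5^{(g)}\sqrt{n})) \ge 0\right) \ge (1 - e^{-\epsilon_2^{(m)} m})(1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n})(1 - e^{-\epsilon_3^{(c)} n}). \qquad (42)$$

Since $P(g < \epsilon_5^{(g)}\sqrt{n}) \ge 1 - e^{-\epsilon_6^{(g)} n}$ (where $\epsilon_6^{(g)}$ is, as all other $\epsilon$'s in this paper are, independent of $n$) from
(42) we finally have

$$P\left(\min_{\mathbf{z}\in Z,\nu\in R^m\setminus 0} \max_{\|\mathbf{a}\|_2=1} (-\mathbf{a}^T A^T\nu + \mathbf{a}^T\mathbf{z}) > 0\right) \ge (1 - e^{-\epsilon_2^{(m)} m})(1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n})(1 - e^{-\epsilon_6^{(g)} n})(1 - e^{-\epsilon_3^{(c)} n}). \qquad (43)$$

Connecting (27) and (43) we obtain

$$P(-\tau(A) > 0) \ge (1 - e^{-\epsilon_2^{(m)} m})(1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n})(1 - e^{-\epsilon_6^{(g)} n})(1 - e^{-\epsilon_3^{(c)} n}),$$

and ultimately

$$\lim_{n\rightarrow\infty} P(\tau(A) < 0) \ge \lim_{n\rightarrow\infty} (1 - e^{-\epsilon_2^{(m)} m})(1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n})(1 - e^{-\epsilon_6^{(g)} n})(1 - e^{-\epsilon_3^{(c)} n}) = 1 \qquad (44)$$

which is what we established as a goal in (12). We summarize the results in the following theorem.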

Theorem 3. (Exact weak threshold) Let $A$ be an $m \times n$ matrix in (1) with i.i.d. standard normal components.
Let the unknown $\mathbf{x}$ in (1) be $k$-sparse. Further, let the location and signs of nonzero elements of $\mathbf{x}$ be
arbitrarily chosen but fixed. Let $k$, $m$, $n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$ be constants independent of
$m$ and $n$. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance
Gaussian random variable. Further, let all $\epsilon$'s below be arbitrarily small constants.

1. Let $\hat{\theta}_w$, ($\beta_w \le \hat{\theta}_w \le 1$) be the solution of

$$(1 - \epsilon_1^{(c)})(1 - \beta_w)\frac{\sqrt{\frac{2}{\pi}} e^{-(\mathrm{erfinv}(\frac{1 - \hat{\theta}_w}{1 - \beta_w}))^2}}{\hat{\theta}_w} - \sqrt{2}\,\mathrm{erfinv}\left((1 + \epsilon_1^{(c)})\frac{1 - \hat{\theta}_w}{1 - \beta_w}\right) = 0. \qquad (45)$$

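If $\alpha$ and $\beta_w$ further satisfy

$$\alpha > \frac{1 - \beta_w}{\sqrt{2\pi}}\left(\sqrt{2\pi} + 2\frac{\sqrt{2(\mathrm{erfinv}(\frac{1 - \hat{\theta}_w}{1 - \beta_w}))^2}}{e^{(\mathrm{erfinv}(\frac{1 - \hat{\theta}_w}{1 - \beta_w}))^2}} - \sqrt{2\pi}\frac{1 - \hat{\theta}_w}{1 - \beta_w}\right) + \beta_w - \frac{\left((1 - \beta_w)\sqrt{\frac{2}{\pi}} e^{-(\mathrm{erfinv}(\frac{1 - \hat{\theta}_w}{1 - \beta_w}))^2}\right)^2}{\hat{\theta}_w} \qquad (46)$$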

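then with overwhelming probability the solution of (2) is the $k$-sparse $\mathbf{x}$ from (1).

2. Let $\hat{\theta}_w$, ($\beta_w \le \hat{\theta}_w \le 1$) be the solution of

$$(1 + \epsilon_2^{(c)})(1 - \beta_w)\frac{\sqrt{\frac{2}{\pi}} e^{-(\mathrm{erfinv}(\frac{1 - \hat{\theta}_w}{1 - \beta_w}))^2}}{\hat{\theta}_w} - \sqrt{2}\,\mathrm{erfinv}\left((1 - \epsilon_2^{(c)})\frac{1 - \hat{\theta}_w}{1 - \beta_w}\right) = 0. \qquad (47)$$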



If on the other hand a and (3 W satisfy 
( 



a < 



(1 + e\ 



( m )\2 



/ — i |/ r a \ [2 -(erfinv(±=^-)) 



2 



V 



then with overwhelming probability there will be a k-sparse x (from a set of x 's with fixed locations 
and signs of nonzero components) that satisfies (1) and is not the solution of (2). 

Proof. The first part was established in [49]. The second part follows from the previous discussion combining (4), (11), (12), (34), (40), and (44). □

While the previous theorem insists on precision one can do what we will refer to as the "deepsilonification" and obtain a way more convenient characterization. After removing all $\epsilon$'s (or say after setting them to the values that are so small when compared to $n$ that on any available finite precision machine they don't impact the above characterization) one then has, in a more informal language, the following.

Assume the setup of the above theorem. Let $\alpha_w$ and $\beta_w$ satisfy the following:

Fundamental characterization of the $\ell_1$ performance:

$$(1-\beta_w)\frac{\sqrt{\frac{2}{\pi}}e^{-\left(\mathrm{erfinv}\left(\frac{1-\alpha_w}{1-\beta_w}\right)\right)^2}}{\alpha_w}-\sqrt{2}\,\mathrm{erfinv}\left(\frac{1-\alpha_w}{1-\beta_w}\right)=0. \tag{49}$$

Then: 

1. If $\alpha>\alpha_w$ then with overwhelming probability the solution of (2) is the $k$-sparse $x$ from (1).

2. If $\alpha<\alpha_w$ then with overwhelming probability there will be a $k$-sparse $x$ (from a set of $x$'s with fixed locations and signs of nonzero components) that satisfies (1) and is not the solution of (2).
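
The characterization (49) is a single scalar equation, so for a given $\beta_w$ the corresponding $\alpha_w$ can be found numerically. The following is a minimal sketch, not the procedure used in [49]: the function names `erfinv`, `residual_49` and `alpha_w` are hypothetical, plain bisection is our own choice of root finder, and erfinv is obtained from the standard normal quantile through the identity $\mathrm{erfinv}(y)=\Phi^{-1}((1+y)/2)/\sqrt{2}$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero-mean, unit-variance Gaussian


def erfinv(y):
    """Inverse error function for -1 < y < 1, via the standard normal quantile."""
    return _STD_NORMAL.inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def residual_49(alpha, beta):
    """Left-hand side of (49) for a candidate alpha and a given beta."""
    x = erfinv((1.0 - alpha) / (1.0 - beta))
    first = (1.0 - beta) * math.sqrt(2.0 / math.pi) * math.exp(-x * x) / alpha
    return first - math.sqrt(2.0) * x


def alpha_w(beta, tol=1e-12):
    """Bisection (an assumed, generic root finder) for the root of (49) on (beta, 1)."""
    lo, hi = beta + 1e-9, 1.0 - 1e-9
    # The residual is negative just above beta and positive just below 1
    # for beta not too close to 0 or 1; otherwise we refuse to guess.
    if not (residual_49(lo, beta) < 0.0 < residual_49(hi, beta)):
        raise ValueError("no sign change found; try a beta further from 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual_49(mid, beta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for beta in (0.05, 0.1, 0.2, 0.3):
        print(f"beta_w = {beta:.2f}  ->  alpha_w = {alpha_w(beta):.4f}")
```

As a sanity check, `alpha_w(0.1924)` should return a value close to 0.5, in line with the weak threshold values reported in the literature for $\alpha=0.5$.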

As stated above equation (49) is the fundamental characterization of the $\ell_1$ performance. Numerical values of the weak threshold obtained using (49) were presented in [49]. As it was demonstrated there, the lower bounds on the thresholds were in an excellent numerical agreement with the optimal thresholds computed in [15, 16]. Theorem 3 establishes that the lower bounds computed in [49] (essentially those one can compute from (49)) are actually the upper bounds as well and as such are the exact values of the weak thresholds. Moreover, in a companion paper [47] we established a qualitative equivalence of the characterization given in (49) and the results obtained in [16]. It is rather fascinating to us how well the axiomatic system of mathematics works and that two so seemingly different approaches, the geometric one from [16] and the purely probabilistic one from [49], result in exactly the same optimal characterization of the performance of $\ell_1$-optimization.

Out of respect for an incredible effort that was put forth to characterize the $\ell_1$ performance in [15, 16, 47, 49] we present in Figure 2 again the plot obtained based on the ultimate characterization (49).
Figure 2: Weak threshold, $\ell_1$-optimization - ultimate performance

4 Upper-bounding $\beta_w$ - signed $x$

In this section we specialize the results from the previous section to the recovery of vectors $x$ with elements known to have certain sign pattern. Without loss of generality we assume that it is known that $x_i\geq 0, 1\leq i\leq n$. We also again assume that $x$ is $k$-sparse, i.e. we assume that $x$ has no more than $k$ nonzero elements. To solve (1) for such an $x$ instead of (2) we consider the following optimization problem (see, e.g. [15, 17, 49])

$$\begin{aligned}\min\quad&\|x\|_1\\ \text{subject to}\quad&Ax=y\\ &x_i\geq 0.\end{aligned}\tag{50}$$
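
A short remark on the structure of (50), stated in our own notation (the all-ones vector $\mathbf{1}$ is introduced here only for illustration): on the feasible set the $\ell_1$ norm is linear, so (50) is a linear program.

```latex
\|x\|_1=\sum_{i=1}^{n}|x_i|=\sum_{i=1}^{n}x_i=\mathbf{1}^Tx
\quad\text{whenever } x_i\geq 0,\ 1\leq i\leq n,
\qquad\text{so (50) reads}\qquad
\min_{x}\ \mathbf{1}^Tx\quad\text{subject to}\quad Ax=y,\ x_i\geq 0,\ 1\leq i\leq n.
```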

In what follows we will determine the upper bound on the weak threshold that characterizes the performance of the above algorithm. Before proceeding further we quickly recall and readjust the definition of the weak threshold. The definition of the weak threshold was already introduced in Section 1 when recovery of general signals (vectors) $x$ was considered. Here, we slightly modify it so that it fits the scenario of a priori known sign patterns of elements of $x$. Namely, for a given $\alpha$, $\beta_w^+$ is the maximum value of $\beta$ such that the solution of (50) is the $\beta n$-sparse solution of (1) for any given $\beta n$-sparse $x$ with a fixed location of nonzero components and a priori known to be comprised of non-negative elements. Since the analysis clearly does not depend on which particular location of nonzero elements is chosen, we can for the simplicity of the exposition and without loss of generality assume that the components $x_1,x_2,\dots,x_{n-k}$ of $x$ are equal to zero and the components $x_{n-k+1},x_{n-k+2},\dots,x_n$ of $x$ are greater than or equal to zero. Under this assumption we have the following (see e.g. [48]) "signed" analogue to Theorem 1.

Theorem 4. (Nonzero part of $x$ has a fixed location; the signs of elements of $x$ are a priori known) Assume that an $m\times n$ matrix $A$ is given. Let $x$ be a $k$-sparse vector whose nonzero components are known to be positive. Also let $x_1=x_2=\dots=x_{n-k}=0$. Further, assume that $y=Ax$ and that $w$ is an $n\times 1$ vector. If

$$(\forall w\in R^n\,|\,Aw=0,\ w_i\geq 0,\ 1\leq i\leq n-k)\qquad -\sum_{i=n-k+1}^{n}w_i<\sum_{i=1}^{n-k}w_i \tag{51}$$

then the solution of (50) is $x$. Moreover, if

$$(\exists w\in R^n\,|\,Aw=0,\ w_i\geq 0,\ 1\leq i\leq n-k)\qquad -\sum_{i=n-k+1}^{n}w_i>\sum_{i=1}^{n-k}w_i \tag{52}$$

then there will be a $k$-sparse nonnegative $x$ that satisfies (1) and is not the solution of (50).
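
To see why the second part of Theorem 4 produces a failure, here is a brief sketch of the standard argument, in our own words rather than quoted from [48]. The particular choice of $x$ and the auxiliary vector $\tilde{x}$ are assumptions made for illustration. Suppose $w$ satisfies $Aw=0$, $w_i\geq 0$ for $1\leq i\leq n-k$, and $-\sum_{i=n-k+1}^{n}w_i>\sum_{i=1}^{n-k}w_i$.

```latex
\text{Take } x_i=0,\ 1\leq i\leq n-k, \quad x_i=|w_i|,\ n-k+1\leq i\leq n,
\quad\text{and set } \tilde{x}=x+w.
\\
\text{Then } A\tilde{x}=Ax+Aw=y \text{ and } \tilde{x}_i\geq 0 \text{ for all } i,
\text{ so } \tilde{x} \text{ is feasible in (50), while}
\\
\|\tilde{x}\|_1=\sum_{i=1}^{n}(x_i+w_i)
=\|x\|_1+\Big(\sum_{i=1}^{n-k}w_i+\sum_{i=n-k+1}^{n}w_i\Big)<\|x\|_1 .
```

Hence this nonnegative $k$-sparse $x$ satisfies (1) but cannot be the solution of (50).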

Below we probabilistically analyze validity of the null-space characterization given in the second part of Theorem 4. Essentially, we will design a mechanism for computing upper bounds on $\beta_w^+$ (in fact, as it was the case in the previous section, since it will be slightly more convenient we will actually determine lower bounds on $\alpha$; that will of course again be conceptually the same as finding the upper bounds on $\beta$).

We start by defining a quantity $\tau^+$ which will be an analogue to $\tau$ from the previous section and will play one of the key roles below

$$\begin{aligned}\tau^+(A)=\min_{w}\quad&\sum_{i=1}^{n-k}w_i+\sum_{i=n-k+1}^{n}w_i\\ \text{subject to}\quad&Aw=0\\ &w_i\geq 0,\ 1\leq i\leq n-k\\ &\|w\|_2\leq 1.\end{aligned}\tag{53}$$

We will continue to assume that the entries of $A$ are i.i.d. standard normal random variables. Similarly to what was established in (12) we have that one can say that for any $\alpha$ and $\beta$ for which

$$\lim_{n\rightarrow\infty}P(\tau^+(A)<0)=1, \tag{54}$$

there is an a priori known to be nonnegative $k$-sparse $x$ (from a set of $x$'s that have given fixed location of non-zeros) that satisfies (1) and which (50) with probability 1 fails to find. For a fixed $\beta$ our goal will be to find the largest possible $\alpha$ for which (54) holds, i.e. for which (50) fails with probability 1.

As it was the case in the previous section, before going through the randomness of the problem and evaluation of $P(\tau^+(A)<0)$, we will first try to provide an expression for $\tau^+$ that is a bit more explicit than the one given by the optimization problem in (53). To facilitate the exposition the rest of the analysis will parallel as much as possible what was presented in the previous section. To that end we start by writing the Lagrange dual of the optimization problem in (53) (as it was the case in the previous section, the strong duality trivially holds throughout the rest of this derivation). After regrouping the terms one has

$$\begin{aligned}\tau^+(A)=\max_{\lambda^{(1)},\nu,\gamma}\ \min_{w}\quad&\sum_{i=1}^{n-k}w_i+\sum_{i=n-k+1}^{n}w_i-\sum_{i=1}^{n-k}\lambda_i^{(1)}w_i+\nu^TAw+\gamma\sum_{i=1}^{n}w_i^2-\gamma\\ \text{subject to}\quad&\lambda_i^{(1)}\geq 0,\ 1\leq i\leq n-k\\ &\gamma\geq 0.\end{aligned}\tag{55}$$
At this point we proceed by solving the inner minimization over $w$. To that end, let

$$f^+(\lambda^{(1)},\nu,\gamma,w)=\sum_{i=n-k+1}^{n}w_i+\sum_{i=1}^{n-k}(1-\lambda_i^{(1)})w_i+\nu^TAw+\gamma\sum_{i=1}^{n}w_i^2-\gamma. \tag{56}$$

Since $f^+(\cdot)$ is convex in $w$ we simply find the optimal $w$ by setting the derivative of $f^+(\cdot)$ with respect to $w$ equal to zero. We then have

$$\begin{aligned}\frac{\partial f^+(\lambda^{(1)},\nu,\gamma,w)}{\partial w_i}&=(1-\lambda_i^{(1)})+\nu^TA_i+2\gamma w_i,\quad 1\leq i\leq n-k\\ \frac{\partial f^+(\lambda^{(1)},\nu,\gamma,w)}{\partial w_i}&=1+\nu^TA_i+2\gamma w_i,\quad n-k+1\leq i\leq n\end{aligned}\tag{57}$$

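where as earlier $A_i$ is the $i$-th column of $A$. Let

$$z^+=[(1-\lambda_1^{(1)}),(1-\lambda_2^{(1)}),\dots,(1-\lambda_{n-k}^{(1)}),1,1,\dots,1]^T. \tag{58}$$

Setting the derivatives in (57) to zero one easily finds

$$w_{opt}^+=-\frac{z^++A^T\nu}{2\gamma}. \tag{59}$$

Removing the inner minimization over $w$ in (55), recognizing the relation between $z^+$ and $\lambda^{(1)}$, and replacing $\nu$ by $-\nu$ (which is allowed since $\nu$ is not constrained in sign, and which turns (59) into $w_{opt}^+=-\frac{z^+-A^T\nu}{2\gamma}$) we have

$$\begin{aligned}\tau^+(A)=\max_{z^+,\nu,\gamma}\quad&(z^+-A^T\nu)^Tw_{opt}^++\gamma\|w_{opt}^+\|_2^2-\gamma\\ \text{subject to}\quad&z_i^+\leq 1,\ 1\leq i\leq n-k\\ &z_i^+=1,\ n-k+1\leq i\leq n\\ &\gamma\geq 0.\end{aligned}\tag{60}$$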
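Finally after plugging $w_{opt}^+$ from (59) in (60) we obtain

$$\begin{aligned}\tau^+(A)=\max_{z^+,\nu,\gamma}\quad&-\frac{\|z^+-A^T\nu\|_2^2}{4\gamma}-\gamma\\ \text{subject to}\quad&z_i^+\leq 1,\ 1\leq i\leq n-k\\ &z_i^+=1,\ n-k+1\leq i\leq n\\ &\gamma\geq 0.\end{aligned}\tag{61}$$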
The maximization over $\gamma$ is then trivial and one finally has

$$\begin{aligned}\tau^+(A)=\max_{z^+,\nu}\quad&-\|z^+-A^T\nu\|_2\\ \text{subject to}\quad&z_i^+\leq 1,\ 1\leq i\leq n-k\\ &z_i^+=1,\ n-k+1\leq i\leq n,\end{aligned}\tag{62}$$

or in a more convenient form

$$\begin{aligned}\tau^+(A)=-\min_{z^+,\nu}\quad&\|z^+-A^T\nu\|_2\\ \text{subject to}\quad&z_i^+\leq 1,\ 1\leq i\leq n-k\\ &z_i^+=1,\ n-k+1\leq i\leq n.\end{aligned}\tag{63}$$
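
For completeness, the maximization over $\gamma$ that takes (61) to (62) can be written out as follows. This is a routine calculus step that we spell out ourselves; the shorthand $c$ and the function $\phi$ are introduced only for this sketch.

```latex
\text{With } c=\|z^+-A^T\nu\|_2>0 \text{ and } \phi(\gamma)=-\frac{c^2}{4\gamma}-\gamma,\ \gamma>0:
\qquad
\phi'(\gamma)=\frac{c^2}{4\gamma^2}-1=0
\ \Longrightarrow\ 
\gamma_{opt}=\frac{c}{2},
\\
\phi''(\gamma)=-\frac{c^2}{2\gamma^3}<0
\quad\text{(so the stationary point is the maximum)},
\qquad
\phi(\gamma_{opt})=-\frac{c}{2}-\frac{c}{2}=-\|z^+-A^T\nu\|_2 .
```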

At this point we are again almost ready to switch to the probabilistic aspect of the analysis. Again, the last piece of transformation is to rewrite (63) as

$$\begin{aligned}\tau^+(A)=-\min_{z^+,\nu}\ \max_{\|a\|_2=1}\quad&a^T(z^+-A^T\nu)\\ \text{subject to}\quad&z_i^+\leq 1,\ 1\leq i\leq n-k\\ &z_i^+=1,\ n-k+1\leq i\leq n.\end{aligned}\tag{64}$$
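
The step from (63) to (64) rests on the variational form of the Euclidean norm. A one-line justification, which we add for convenience (the generic vector $v$ is our own notation), is:

```latex
\max_{\|a\|_2=1}a^Tv=\|v\|_2
\quad\text{for any } v\in R^n,
\qquad\text{since } a^Tv\leq\|a\|_2\|v\|_2=\|v\|_2 \text{ (Cauchy--Schwarz), with equality at } a=\frac{v}{\|v\|_2} \text{ when } v\neq 0 .
\\
\text{Taking } v=z^+-A^T\nu \text{ in (63) gives (64).}
```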

Now we are ready to invoke the results from Theorem 2. We do so through the following lemma which is a slightly modified Lemma 1 which itself is a slightly modified version of Lemma 3.1 from [29].

Lemma 2. Let $A$ be an $m\times n$ matrix with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $n\times 1$ and $m\times 1$ vectors with i.i.d. standard normal components. Also, let $g$ be a standard normal random variable and let $Z^+$ be a set such that $Z^+=\{z^+\in R^n\,|\,z_i^+=1,\ n-k+1\leq i\leq n \text{ and } z_i^+\leq 1,\ 1\leq i\leq n-k\}$. Then

$$P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\ \max_{\|a\|_2=1}\left(-a^TA^T\nu+\|\nu\|_2g-\zeta_{a,z^+,\nu}\right)\geq 0\right)\geq P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\ \max_{\|a\|_2=1}\left(\|\nu\|_2\sum_{i=1}^{n}\mathbf{g}_ia_i+\sum_{i=1}^{m}\mathbf{h}_i\nu_i-\zeta_{a,z^+,\nu}\right)\geq 0\right). \tag{65}$$

Proof. The proof is again exactly the same as the one of Lemma 3.1 in [29]. The only difference is that one should make identical copies over $z$ of the processes $X_{x,\nu}$, $Y_{x,\nu}$ defined in that proof. The rest of the proof remains unaltered. □

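Let $\zeta_{a,z^+,\nu}=\epsilon_5^{(g)}\sqrt{n}\|\nu\|_2-a^Tz^+$ with $\epsilon_5^{(g)}>0$ being an arbitrarily small constant independent of $n$. The right-hand side of the inequality in (65) is then the following probability of interest

$$P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\ \max_{\|a\|_2=1}\left(\|\nu\|_2\sum_{i=1}^{n}\mathbf{g}_ia_i+\sum_{i=1}^{m}\mathbf{h}_i\nu_i-\epsilon_5^{(g)}\sqrt{n}\|\nu\|_2+a^Tz^+\right)\geq 0\right).$$

One can then repeat the steps until (30) from the previous section and obtain

$$\begin{aligned}&P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\ \max_{\|a\|_2=1}\left(\|\nu\|_2\sum_{i=1}^{n}\mathbf{g}_ia_i+\sum_{i=1}^{m}\mathbf{h}_i\nu_i-\epsilon_5^{(g)}\sqrt{n}\|\nu\|_2+a^Tz^+\right)\geq 0\right)\\ &\quad=P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\left\|\mathbf{g}+\frac{1}{\|\nu\|_2}z^+\right\|_2\geq\|\mathbf{h}\|_2+\epsilon_5^{(g)}\sqrt{n}\right)\\ &\quad\geq\left(1-e^{-\epsilon_2^{(m)}m}\right)P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\left\|\mathbf{g}+\frac{1}{\|\nu\|_2}z^+\right\|_2\geq(1+\epsilon_1^{(m)})\sqrt{m}+\epsilon_5^{(g)}\sqrt{n}\right).\end{aligned}\tag{66}$$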



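Now, let $\mathbf{g}^+=[\mathbf{g}^+_{(1)},\mathbf{g}^+_{(2)},\dots,\mathbf{g}^+_{(n-k)},\mathbf{g}_{n-k+1},\mathbf{g}_{n-k+2},\dots,\mathbf{g}_n]^T$, where $[\mathbf{g}^+_{(1)},\mathbf{g}^+_{(2)},\dots,\mathbf{g}^+_{(n-k)}]$ are $[\mathbf{g}_1,\mathbf{g}_2,\dots,\mathbf{g}_{n-k}]$ sorted in increasing order. Then clearly,

$$\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\left\|\mathbf{g}+\frac{1}{\|\nu\|_2}z^+\right\|_2=\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\left\|\mathbf{g}^+-\frac{1}{\|\nu\|_2}z^+\right\|_2. \tag{67}$$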

Moreover, the optimization on the right-hand side of (66) is again structurally the same as the one in equation (15) in [49] (again, to be completely exact, it is the same as the nonnegative weak threshold equivalent to (15)). Essentially, the exact equivalence between these optimizations is achieved if in (15) from [49] $\mathbf{h}$ is replaced by $\mathbf{g}^+$, $\nu$ is replaced by $\frac{1}{\|\nu\|_2}$, $\lambda$ is restricted to the lower $(n-k)$ components, $z$ is replaced by $z^+$, and one recalls that we earlier introduced simplification $z_i^+=1-\lambda_i^{(1)},\ 1\leq i\leq n-k,\ \lambda^{(1)}\geq 0$ (that would in essence be the nonnegative weak threshold equivalent to (15); similarly to what we have mentioned in the previous section, the nonnegative weak threshold equivalent to (15) was not explicitly written anywhere in [49] but is rather obvious; of course, as mentioned earlier, in [49] we made a nonnegative "weak" equivalence to (29) instead). With these replacements one can then use the machinery of [49] to establish



min (||g + - — — z+|| 2 ) 

z+£Z,v£R n \0 \\v\\2 



where c+ is the solution of 



\ 



£ (tt) 2 - m+)Tz) (68) 



i=cj,+l 



(g T z) " £■=! Si 



g r+ - (69) 



71 C>uj 

We recall again, that the key point to success of our method is that the derivation of [49] establishes equalities in (68) and (69). At this point we have established the core of our upper-bounding arguments for the nonnegative case. The rest would be just a repetition of the derivations done in the previous section that would make everything precise. After replacing $\mathbf{g}$ by $\mathbf{g}^+$ and repeating literally every step of the derivation in the previous section from (34) until (42) one arrives at a "nonnegative" equivalent to (42)

$$P\left(\min_{z^+\in Z^+,\nu\in R^m\setminus 0}\ \max_{\|a\|_2=1}\left(-a^TA^T\nu+a^Tz^++\|\nu\|_2(g-\epsilon_5^{(g)}\sqrt{n})\right)\geq 0\right)\geq\left(1-e^{-\epsilon_2^{(m)}m_w}\right)\left(1-e^{-\epsilon_2^{(g)}n}\right)\left(1-e^{-\epsilon_4^{(g)}n}\right)\left(1-e^{-\epsilon_3^{(c)}n}\right). \tag{70}$$

Repeating then the last piece of argument related to $P(g<\epsilon_5^{(g)}\sqrt{n})\geq 1-e^{-\epsilon_6^{(g)}n}$ one then arrives at

$$P(-\tau^+(A)>0)\geq\left(1-e^{-\epsilon_2^{(m)}m_w}\right)\left(1-e^{-\epsilon_2^{(g)}n}\right)\left(1-e^{-\epsilon_4^{(g)}n}\right)\left(1-e^{-\epsilon_6^{(g)}n}\right)\left(1-e^{-\epsilon_3^{(c)}n}\right),$$

and ultimately at

$$\lim_{n\rightarrow\infty}P(\tau^+(A)<0)\geq\lim_{n\rightarrow\infty}\left(1-e^{-\epsilon_2^{(m)}m_w}\right)\left(1-e^{-\epsilon_2^{(g)}n}\right)\left(1-e^{-\epsilon_4^{(g)}n}\right)\left(1-e^{-\epsilon_6^{(g)}n}\right)\left(1-e^{-\epsilon_3^{(c)}n}\right)=1, \tag{71}$$

which is what we established as a goal in (54). We summarize the results in the following theorem.

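Theorem 5. (Exact weak threshold - signed $x$) Let $A$ be an $m\times n$ matrix in (1) with i.i.d. standard normal components. Let the unknown $x$ in (1) be $k$-sparse and let it be a priori known that its nonzero components are positive. Further, let the location of the nonzero elements of $x$ be arbitrarily chosen but fixed. Let $k, m, n$ be large and let $\alpha=\frac{m}{n}$ and $\beta_w^+=\frac{k}{n}$ be constants independent of $m$ and $n$. Let erfinv be the inverse of the standard error function associated with zero-mean unit variance Gaussian random variable. Further, let all $\epsilon$'s below be arbitrarily small constants.

1. Let $\hat{\theta}_w^+$, ($\beta_w^+\leq\hat{\theta}_w^+\leq 1$) be the solution of

$$(1-\epsilon_1^{(c)})(1-\beta_w^+)\frac{\sqrt{\frac{1}{2\pi}}e^{-\left(\mathrm{erfinv}\left(2\frac{1-\theta_w^+}{1-\beta_w^+}-1\right)\right)^2}}{\theta_w^+}-\sqrt{2}\,\mathrm{erfinv}\left(2\frac{(1+\epsilon_1^{(c)})(1-\theta_w^+)}{1-\beta_w^+}-1\right)=0. \tag{72}$$

If $\alpha$ and $\beta_w^+$ further satisfy

$$\alpha>\frac{1-\beta_w^+}{2\sqrt{2\pi}}\left(\sqrt{2\pi}+2\frac{\sqrt{2\left(\mathrm{erfinv}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)^2}}{e^{\left(\mathrm{erfinv}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)^2}}-\sqrt{2\pi}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)+\beta_w^+-\frac{\left((1-\beta_w^+)\sqrt{\frac{1}{2\pi}}e^{-\left(\mathrm{erfinv}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)^2}\right)^2}{\hat{\theta}_w^+} \tag{73}$$

then with overwhelming probability the solution of (50) is the positive $k$-sparse $x$ from (1).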
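2. Let $\hat{\theta}_w^+$, ($\beta_w^+\leq\hat{\theta}_w^+\leq 1$) be the solution of

$$(1+\epsilon_1^{(c)})(1-\beta_w^+)\frac{\sqrt{\frac{1}{2\pi}}e^{-\left(\mathrm{erfinv}\left(2\frac{1-\theta_w^+}{1-\beta_w^+}-1\right)\right)^2}}{\theta_w^+}-\sqrt{2}\,\mathrm{erfinv}\left(2\frac{(1-\epsilon_1^{(c)})(1-\theta_w^+)}{1-\beta_w^+}-1\right)=0. \tag{74}$$

If on the other hand $\alpha$ and $\beta_w^+$ satisfy

$$\alpha<\frac{1}{(1+\epsilon_1^{(m)})^2}\left((1-\epsilon_1^{(g)})\left(\hat{\theta}_w^++\frac{1-\beta_w^+}{\sqrt{2\pi}}\frac{\sqrt{2\left(\mathrm{erfinv}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)^2}}{e^{\left(\mathrm{erfinv}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)^2}}\right)-\frac{\left((1-\beta_w^+)\sqrt{\frac{1}{2\pi}}e^{-\left(\mathrm{erfinv}\left(2\frac{1-\hat{\theta}_w^+}{1-\beta_w^+}-1\right)\right)^2}\right)^2}{\hat{\theta}_w^+(1+\epsilon_3^{(g)})^{-2}}\right) \tag{75}$$

then with overwhelming probability there will be a positive $k$-sparse $x$ (from a set of $x$'s with fixed locations of nonzero components) that satisfies (1) and is not the solution of (50).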



Proof. The first part was established in [49]. The second part follows from the previous discussion combining (52), (53), (54), (68), (40), and (71) and the corresponding derivation from the previous section. □

As it was the case in the previous section, the previous theorem insists on precision and involves "epsilon" type of characterization. However, one can again do what we, in Section 3, referred to as the "deepsilonification" and obtain a way more convenient characterization. After removing all $\epsilon$'s one then has, in a more informal language, the following.

Assume the setup of the above theorem. Let $\alpha_w^+$ and $\beta_w^+$ satisfy the following:

Fundamental characterization of the $\ell_1$ performance ($x$ in (1) a priori known to be positive):

$$(1-\beta_w^+)\frac{\sqrt{\frac{1}{2\pi}}e^{-\left(\mathrm{erfinv}\left(2\frac{1-\alpha_w^+}{1-\beta_w^+}-1\right)\right)^2}}{\alpha_w^+}-\sqrt{2}\,\mathrm{erfinv}\left(2\frac{1-\alpha_w^+}{1-\beta_w^+}-1\right)=0. \tag{76}$$



Then: 
1. If $\alpha>\alpha_w^+$ then with overwhelming probability the solution of (50) is the a priori known to be positive $k$-sparse $x$ from (1).

2. If $\alpha<\alpha_w^+$ then with overwhelming probability there will be an a priori known to be positive $k$-sparse $x$ (from a set of $x$'s with fixed locations of nonzero components) that satisfies (1) and is not the solution of (50).
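
The "nonnegative" characterization (76) can be evaluated numerically in the same spirit. The sketch below is only illustrative: the names `residual_76` and `alpha_w_plus` are hypothetical, bisection is an assumed generic root finder, and we use the identity $\mathrm{erfinv}(2q-1)=\Phi^{-1}(q)/\sqrt{2}$ so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero-mean, unit-variance Gaussian


def residual_76(alpha, beta):
    """Left-hand side of (76) for a candidate alpha and a given beta."""
    q = (1.0 - alpha) / (1.0 - beta)
    # erfinv(2q - 1) expressed through the standard normal quantile.
    x = _STD_NORMAL.inv_cdf(q) / math.sqrt(2.0)
    first = (1.0 - beta) * math.sqrt(1.0 / (2.0 * math.pi)) * math.exp(-x * x) / alpha
    return first - math.sqrt(2.0) * x


def alpha_w_plus(beta, tol=1e-12):
    """Bisection (an assumed, generic root finder) for the root of (76) on (beta, 1)."""
    lo, hi = beta + 1e-9, 1.0 - 1e-9
    # Negative residual just above beta, positive just below 1.
    if not (residual_76(lo, beta) < 0.0 < residual_76(hi, beta)):
        raise ValueError("no sign change found; try a beta further from 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual_76(mid, beta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for beta in (0.05, 0.1, 0.2, 0.3):
        print(f"beta_w^+ = {beta:.2f}  ->  alpha_w^+ = {alpha_w_plus(beta):.4f}")
```

For example, `alpha_w_plus(0.279)` should come out close to 0.5, which is consistent with the nonnegative weak threshold values known from the literature at $\alpha=0.5$.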

As stated above equation (76) is the fundamental characterization of the $\ell_1$ performance when applied to recovery of vectors $x$ that are a priori known to have positive (in general of any sign) nonzero components. Numerical values of the "nonnegative" weak threshold obtained using (76) were presented in [49]. As it was demonstrated there, the lower bounds on the thresholds were in an excellent numerical agreement with the optimal thresholds computed in [17, 18]. Theorem 5 establishes that the lower bounds computed in [49] (essentially those one can compute from (76)) are actually the upper bounds as well and as such are the exact values of the weak thresholds. Moreover, as it was the case for the general vectors $x$ from the previous section, in a companion paper [47] we established a qualitative equivalence of the characterization given in (76) and the results obtained in [17]. Again in a rather fascinating way the axiomatic system of mathematics works so well that two seemingly different approaches, the geometric one from [17] and the purely probabilistic one from [49], result in exactly the same optimal characterization of the performance of $\ell_1$-optimization.

We one more time, out of respect for an incredible effort that was put forth to characterize the $\ell_1$ performance in [17, 18, 47, 49], present in Figure 3 the plot obtained based on the ultimate "nonnegative" characterization (76).



Figure 3: Weak threshold, $\ell_1$-optimization - ultimate performance (plot title: Weak threshold, $\ell_1$-optimization, signed $x$)



5 Discussion 

In this paper we considered under-determined linear systems of equations with sparse solutions. We looked from a theoretical point of view at a classical polynomial-time $\ell_1$-optimization algorithm. Under the assumption that the system matrix $A$ has i.i.d. standard normal components, we derived upper bounds on the values of the recoverable weak thresholds in the so-called linear regime, i.e. in the regime when the recoverable sparsity is proportional to the length of the unknown vector. Obtained upper bounds match the corresponding lower bounds we found through a framework designed in [49]. Combination of the mechanism from [49] and the one that we presented in this paper is then enough to provide an explicit ultimate characterization of the success of $\ell_1$ optimization when applied in solving under-determined systems of linear equations with sparse solutions.

Further developments are pretty much then unlimited (though, of course their scientific value will never 
match the one of the results presented in [49] and here). Namely, we hardly ever encountered a "sparse 
recovery" type of the problem where the lower-bounding technique from [49] was not exact. The mechanism 
that we designed in this paper then helps to make all the success of [49] ultimate, i.e. it helps to prove what 
ultimately can be proved for this type of optimization problems. 

Various specific problems that have been of interest in a broad scientific literature developed over the 
last few years, like quantifying the performance of $\ell_1$ type of optimization problems in solving systems with
special structure of the solution vector (block-sparse, binary, box-constrained, low-rank matrix, partially 
known locations of nonzero components, just to name a few), systems with non-exact (noisy) solution 
vectors and/or equations can then easily be handled. In a few forthcoming companion papers we will present 
some of these applications. However, as it will be clear when these results appear, each of them will require 
some work to put the mechanism forth but in essence they all will be fairly simple extensions of what we 
presented in [49] and here. The heart of it all will really be the lower-bounding mechanism designed in [49] 
and the complementary upper-bounding mechanism designed in this paper and how the two ultimately meet 
in a somewhat magical way. 